Abstract

CELLO2GO (http://cello.life.nctu.edu.tw/cello2go/) is a publicly available, web-based system for screening various properties
of a targeted protein and its subcellular localization. Herein, we describe how this platform is used to obtain brief or
detailed gene ontology (GO)-type categories, including subcellular localization(s), for the queried proteins by combining the
CELLO localization-predicting and BLAST homology-searching approaches. Given a query protein sequence, CELLO2GO uses
BLAST to search for homologous sequences that are GO annotated in an in-house database derived from the UniProt
KnowledgeBase database. At the same time, CELLO attempts to predict at least one subcellular localization on the basis of the
species in which the protein is found. When homologs for the query sequence have been identified, the numbers of terms
found for each of their GO categories, i.e., cellular compartment, molecular function, and biological process, are summed
and presented as pie charts representing possible functional annotations for the queried protein. Although the
experimental subcellular localization of a protein may not be known, and thus not annotated, CELLO can confidently
suggest a subcellular localization. CELLO2GO should be a useful tool for research involving complex subcellular systems
because it combines CELLO and BLAST into one platform and its output is easily manipulated such that user-specific
questions may be readily addressed.
Introduction

It is generally believed that the function of a protein is related to
its subcellular localization, because the environment of a protein
provides part of the relevant context necessary for function.
However, even if a subcellular localization is known, it should not
be the only piece of acquired evidence, as additional information
concerning the protein should be helpful during the course of a
biological study related to the protein. To obtain a global overview
of the function(s) that an uncharacterized protein might have in
vivo, the Gene Ontology (GO) annotations, e.g., cellular location,
molecular function, and biological process, of homologous proteins
[1] are often useful. To rapidly and accurately find the appropriate
GO annotations and determine the possible relationships within a
given set of proteins, BLAST [2] is often used to search for
proteins with similar sequences and known functions [3] so that
functional GO-category annotations can be made [4,5]. When a
BLAST search is not productive, however, advanced computational
tools are often used to provide clues that will enable prediction of
GO-like terms. Therefore, many programs have been developed to
predict the function [6] and the subcellular localization of a
targeted protein
[7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]. Some of these
programs provide additional information, e.g., protein-protein
interactions [14,19] or three-dimensional structure comparisons
[18], although most just attempt to determine the subcellular
compartment of the targeted protein. Additionally, studies have
found that the more similar protein sequences are, the greater the
likelihood that the proteins will be found in the same subcellular
localization [17,22]. A hybrid approach combining machine learning
and homology searching can also provide accurate
subcellular-localization predictions [22]. The reason why certain
computational tools provide improved subcellular localization
prediction appears to be that GO information
[8,9,10,12,15,20,21] or a homology-based modular structure
comparison [23] is included in the prediction routine. However, if
homologs for the protein of interest are not GO annotated, or if a
signature(s) and sequences similar to that of the query protein are
not found in a relevant, searched database, such as InterPro [24],
then a prediction cannot be made. Among the programs that do not
use a homology-based approach, CELLO [22,25] performs as well as
one that requires a much larger amount of training data [26]. CELLO
is easy to use and has a fast computational time, as has been noted
[27,28]. The ability of CELLO to identify possible subcellular
localizations for targeted proteins is especially important for
proteomic research when the compartments are of special interest,
but when homologs have not been found by BLAST or when GO
annotations are few in number.

Notably, a web service that conveniently provides comprehensive
functional and localization annotation, and can correlate the two,
has not been available. By extending the hybrid approach [22], we
report herein the implementation of the CELLO2GO server
(http://cello.life.nctu.edu.tw/cello2go/), which provides brief
and/or detailed annotations of GO terms related to homologs of a
query protein found by BLAST searching in combination with a
CELLO-predicted subcellular localization(s) for the queried
protein. In addition, CELLO2GO can be used to identify protein
sequences and their associated GO and CELLO terms when query
sequences are submitted in batch mode. We describe how BLAST in
CELLO2GO collects and displays the available GO-based annotations
of homologous sequences found in an in-house database derived from
the UniProt KnowledgeBase for a query protein or a set of query
proteins from a wide variety of organisms [29], while, at the same
time, CELLO in CELLO2GO performs the same tasks for subcellular
localization(s). CELLO2GO output is presented as Google-created pie
graphs and hyperlinks, which clearly display the evidence for each
annotation. We believe that CELLO2GO will be of assistance in
future genomic and proteomic research because it is easy to use,
and its results can be manipulated to provide information specific
to the concerns of the user.

Methods 

The flowchart for CELLO2GO is illustrated in Figure 1A. If the
species from which the sequence is derived is known, BLAST will
immediately search for homologs within the corresponding
sub-database of an in-house database(s) (see below for information
concerning the in-house databases); if not, the entire database(s)
can be searched. By default, all GO terms for each retrieved
homolog are collected from the database(s) and grouped into one of
the three GO categories. The first in-house database to be searched
is derived from UniProtKB/SwissProt, which currently contains the
best documented and most complete function-annotated sequences.
The server can also search for GO annotations defined by InterPro
if functional information is absent from the homolog records, or
for GO terms recorded in the UniProtKB/TrEMBL database if no
homologs are found in the InterPro and UniProtKB/SwissProt-derived
databases. Separately, CELLO attempts to predict a subcellular
localization(s) for the query protein using its most recently
trained model. CELLO may also be implemented after the organism
type has been identified by BLAST searching, e.g., after
identifying the query sequence as from a Gram-positive or
Gram-negative bacterium. For each query sequence, the CELLO2GO
results are displayed as Google-created pie charts at the output
interface and represent how often a potential GO annotation has
been associated with all retrieved homologs and in which
localization(s) the query protein may be found. The names of all
retrieved proteins and their functions presented in the pie charts
are also listed on the output page. After clicking on an ontology
term of interest in the list below the pie charts, the retrieved
proteins in the searched database(s) having the same ontology are
shown.
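The order in which the annotation sources are consulted can be illustrated with a minimal Python sketch. It is not the server's implementation: the function name, the lookup signature, and the toy lookups are hypothetical, and only the source order (UniProtKB/SwissProt, then InterPro, then UniProtKB/TrEMBL) is taken from the description above.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical lookup signature: takes a query sequence and returns the GO
# terms (possibly none) found for it in one annotation source.
Lookup = Callable[[str], List[str]]


def annotate_with_fallback(
    query: str, sources: List[Tuple[str, Lookup]]
) -> Tuple[Optional[str], List[str]]:
    """Return (source name, GO terms) from the first source that yields terms."""
    for name, lookup in sources:
        terms = lookup(query)
        if terms:
            return name, terms
    return None, []


# Toy stand-ins for the three sources, in the order described in the text.
toy_sources = [
    ("UniProtKB/SwissProt", lambda seq: []),          # no annotated homolog
    ("InterPro", lambda seq: ["ion binding"]),        # signature-derived term
    ("UniProtKB/TrEMBL", lambda seq: ["binding"]),    # never reached here
]
source, terms = annotate_with_fallback("MKT", toy_sources)
```

In this toy run the SwissProt-derived lookup returns nothing, so the InterPro result is used and the TrEMBL lookup is never called.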

When sequences are batch inputted into CELLO2GO, the server
processes the data in the same manner as when one sequence is
inputted at a time. The annotations for each protein are retained
while additional sequences are processed. After subjecting a set of
proteins with various functions (e.g., from a proteomic dataset) to
CELLO2GO, the output GO annotations and CELLO-identified
subcellular localizations of the inputted sequences are displayed
as pie charts, allowing the user to visualize how many GO
annotations and subcellular localizations are associated with the
inputted sequence set. The names of the proteins associated with
each corresponding sequence and the annotations counted for the pie
charts are also listed on the same page. The inputted sequences in
a set that share common ontology features can be grouped by
selecting a single ontology term in the list one at a time. When
the GO annotations of one sequence in a set of input sequences are
of interest, clicking on its number in the first column of the
output list displays its GO annotations in detail, recreates the
pie charts to reflect the GO annotations of only that sequence, and
updates the list below the pie charts to show the sequences
homologous to the input sequence of interest according to their
BLAST-retrieved UniProtKB/SwissProt entry identifiers, gene names,
and associated GO annotations, in the order of their E-values.
Shown above the "Ontology Results" caption is a button labeled
"GO detail" that, when clicked, allows the user to switch between
detailed GO and GO-slim terms.

Background Databases 

To focus the BLAST search on sequences from similar organisms and
to accelerate data processing, we prepared, in April 2013, a
database of all non-redundant proteins from the UniProtKB/SwissProt
database that contained 539616 protein records (separated into
16316 viral, 18993 archaeal, 328774 bacterial, and 175533
eukaryotic sequences) and a database from the UniProtKB/TrEMBL
database (32051161 protein records, separated into 1599881 viral,
428746 archaeal, 22935705 bacterial, and 7086829 eukaryotic
sequences). We formatted and indexed these sequences so that the
user needs to BLAST search only the appropriate sub-database when
the species for the sequence(s) is known. All fundamental
information for the in-house databases was formatted as a MySQL
database. In the single-sequence mode, after the query sequence has
been compared by BLAST with those in the user-selected
sub-database, homologous sequences are returned if their E-value is
the same as or smaller than a user-specified threshold (default
E-value is 0.001), and at the same time the GO terms are retrieved
automatically for the homologous entries; this retrieval is the
most time-consuming step for a multiple-input-sequence submission.
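The E-value cut-off can be illustrated with a short Python sketch. It assumes, purely for illustration, that the hits are available in the standard 12-column BLAST+ tabular layout (E-value in the eleventh column); the text does not state which output format the server parses, and the function name and the subject identifiers hypA, hypB, and hypC are hypothetical.

```python
from typing import List, Tuple


def filter_hits(tabular: str, max_evalue: float = 0.001) -> List[Tuple[str, str, float]]:
    """Keep (query, subject, E-value) rows whose E-value is <= max_evalue."""
    kept = []
    for line in tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, subject_id, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            kept.append((query_id, subject_id, evalue))
    # List the most significant (smallest E-value) homologs first.
    kept.sort(key=lambda hit: hit[2])
    return kept


# Toy hits: qseqid sseqid pident length mismatch gapopen qstart qend
# sstart send evalue bitscore
example = "\n".join([
    "q1\thypB\t30.0\t80\t56\t3\t5\t84\t9\t88\t0.5\t25",
    "q1\thypC\t45.0\t90\t49\t1\t1\t90\t1\t90\t0.001\t40",
    "q1\thypA\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t200",
])
hits = filter_hits(example)
```

A hit whose E-value equals the threshold is retained, matching the "same as or smaller than" rule stated above.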

For homologs, their GO terms are subdivided into molecular
functions, biological processes, and cellular components, and the
number of terms found in each category is summed. GO terms are also
summed in their simplified/generalized forms, the GO slims [30],
which give a more robust overview and suit other specific problems.
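The summation per category can be sketched in Python as follows. This is an illustrative tally under the assumption that each retrieved annotation is available as a (category, term) pair; the function names and the toy annotations are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

CATEGORIES = ("molecular function", "biological process", "cellular component")


def tally_go_terms(annotations: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count how often each GO term occurs within each of the three categories."""
    tallies = {category: Counter() for category in CATEGORIES}
    for category, term in annotations:
        tallies[category][term] += 1
    return tallies


def as_percentages(counts: Counter) -> Dict[str, float]:
    """Convert raw counts into the percentage shares a pie chart would show."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: 100.0 * n / total for term, n in counts.items()}


# Toy annotations collected from four hypothetical homologs.
toy_annotations = [
    ("molecular function", "ion binding"),
    ("molecular function", "ion binding"),
    ("molecular function", "ion binding"),
    ("molecular function", "RNA binding"),
    ("cellular component", "cytoplasm"),
]
tallies = tally_go_terms(toy_annotations)
shares = as_percentages(tallies["molecular function"])
```

The same tally can be run on GO-slim terms instead of detailed terms by mapping each term to its slim before counting.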

Generation of GO slims 

Even though the UniProtKB/SwissProt database contains the most
detailed information available in any database, the amount of
information differs for each entry, and this difference in
information content is reflected in the tree-like GO constructions
of the categories, i.e., the more data we have, the better
developed the trees. When we would like to just scrutinize and
obtain an overview of a GO hierarchy, the generic GO-slim
categories (http://www.geneontology.org/GO.slims.shtml), which are
not species specific, are suitable for this task. For the output,
the GO slims were obtained by tracing back to the ontological roots
of the proteins using the GO terms in the UniProtKB/SwissProt
database. For example, for a functional annotation of entry P27989,
we can trace the path from the deepest GO term, "nickel cation
binding," to its root by passing through the GO terms
"transition-metal-ion binding," "cation binding," "ion binding,"
and "binding." In this case, only the GO term "ion binding" is
retained and denoted as the GO slim term. CELLO2GO counts all
traced GO slims as general GO annotations.

Figure 1. Flowchart for CELLO2GO and examples of the input and output interfaces. (A) The flowchart for annotation of a protein
sequence used by CELLO2GO. The search databases used in the work are modified forms of the UniProtKB/SwissProt and UniProtKB/TrEMBL
databases. (B) The CELLO2GO output page for a multiple-sequence query, which provides four pie charts, one for the localization predictions
returned by CELLO (upper right) and three for the GO terms returned by BLAST for each query sequence. The list, which can be hidden, below the pie
charts presents the CELLO-predicted subcellular localizations and the associated GO annotations in the order that the sequences were submitted. (C)
The CELLO2GO output page for a single sequence query, which provides four pie charts, one for the CELLO-predicted subcellular localizations (upper
right) and three for the GO terms returned by BLAST for the retrieved homologous sequences. The list, which can be hidden, below the pie graphs
presents the CELLO-predicted subcellular localization(s) and the associated GO annotations in the order that the homologous sequences were found
by BLAST. (D) By clicking on the GO-term list in (B), a new list of submitted sequence entries with the same GO term is returned.
doi:10.1371/journal.pone.0099368.g001
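The trace-back described for entry P27989 can be sketched in Python. The parent chain below reproduces only the path quoted in the text; the slim set is an assumed one-term subset of the generic GO slim, and the sketch ignores that a real GO term may have several parents.

```python
from typing import Dict, List, Optional, Set

# Toy is-a chain copied from the P27989 example in the text.
PARENT: Dict[str, str] = {
    "nickel cation binding": "transition-metal-ion binding",
    "transition-metal-ion binding": "cation binding",
    "cation binding": "ion binding",
    "ion binding": "binding",
}

# Assumption for illustration: only "ion binding" on this path is a slim term.
GENERIC_SLIM: Set[str] = {"ion binding"}


def path_to_root(term: str) -> List[str]:
    """Follow parent links from a term up to the root of its ontology."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def to_slim(term: str) -> Optional[str]:
    """Return the first term on the path to the root that is a GO slim."""
    for candidate in path_to_root(term):
        if candidate in GENERIC_SLIM:
            return candidate
    return None


trace = path_to_root("nickel cation binding")
slim = to_slim("nickel cation binding")
```

Here the deepest term is replaced by "ion binding", the slim term retained in the example above.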

Subcellular Localization prediction

To complement incomplete annotations in the background database, a
homology-ontology annotation retrieved by BLAST should be
accompanied by an accurate subcellular localization prediction for
each homologous sequence. CELLO has been shown to be helpful for
the prediction of subcellular localizations of the proteins found
in proteomic data [28]. Using multiple, integrated machine-learned
classifiers, CELLO predicts in which of four subcellular
localizations in archaea and in Gram-positive bacteria, five
subcellular localizations in Gram-negative bacteria, and twelve
subcellular localizations in eukaryotes the targeted protein might
be found, with the four archaeal and Gram-positive bacterial
localizations being the extracellular space, the cell wall, the
cytoplasmic membrane, and the cytoplasm; the five Gram-negative
bacterial localizations being the extracellular space, the outer
membrane, the periplasm, the cytoplasmic (inner) membrane, and the
cytoplasm; and the 12 eukaryotic localizations being chloroplasts,
the cytoplasm, the cytoskeleton, the endoplasmic reticulum, the
extracellular/secretory space, the Golgi, lysosomes, mitochondria,
the nucleus, peroxisomes, the plasma membrane, and vacuoles.
Because the amount of subcellular localization data has increased
exponentially over the years, CELLO has been retrained on the
latest data, and this updated version is the one wrapped in
CELLO2GO. The datasets used for prediction and evaluation are from
PSORTb3.0 [23].

Evaluation measure 

CELLO2GO is not meant for prediction of a protein's function(s),
but for correlating one protein with another through the same
functional annotation. To achieve this goal, it is necessary to
obtain as many functional annotations as possible. Retrieved GO
annotations are retained for outputted sequences similar to that of
the query protein. Even when dealing with multidomain proteins,
BLAST, which uses a local alignment approach, can easily find all
similar sequences in the database(s), with their functional
annotations provided as output. It is very important to
functionally annotate each protein in the output set, even those
proteins that are multifunctional and/or promiscuous, so that the
CELLO output complements any incomplete GO cellular-component
ontology annotations. For our purposes, we treated the CELLO2GO
results for a given sequence in our example (see below) as correct
if any collected GO-slim cellular-component annotation(s) was also
correct.

To validate that CELLO2GO can correctly identify the subcellular
localization of a query protein, we used the archaeal and the
bacterial Gram-positive and Gram-negative benchmark datasets found
in PSORTb3.0 [23], which we denoted PS30Arch, PS30GP, and PS30GN,
respectively. We also used the newly documented Gram-negative
Pseudomonas aeruginosa PAO1 genome/proteome sequence dataset [31]
(http://www.pseudomonas.com/), which contains, in part,
hypothetical and uncharacterized proteins that can be difficult to
functionally annotate because homologs or useful GO annotations are
missing in the UniProtKB/SwissProt/TrEMBL databases.

We then ascertained whether, for a given protein, the subcellular
localization(s) found by CELLO and by BLAST (the latter defined as
GO slim(s)) agreed. For example, if a protein was assigned the
GO-slim terms "external encapsulating structure", "extracellular
region" or "extracellular space", then the associated CELLO term
would be "extracellular". Likewise, the GO-slim term "plasma
membrane" was associated with the CELLO terms "outer membrane" and
"inner membrane", the GO-slim terms "cell" and "intracellular" with
the CELLO term "periplasmic", and the GO-slim term "cytoplasm" with
the CELLO term "cytoplasmic". Because CELLO2GO uses a hybrid
procedure [22], it identifies the potential subcellular
localization of the query protein using the GO cellular-component
annotation of homologous sequences retrieved by BLAST along with
other GO annotations, and/or the CELLO-predicted localization(s) if
BLAST-retrieved sequences are not associated with a GO
cellular-component annotation or if homologs are not found. We
calculated the prediction accuracy, $Q_i$, which is defined as

$$Q_i = \frac{C_i}{n_i} \times 100,$$

to assess the performance of the CELLO prediction, where $C_i$ is
the number of correct CELLO predictions for the localization $i$
(e.g., one of the five Gram-negative bacterial localizations), and
$n_i$ is the number of sequences in localization $i$. The overall
accuracy is given by

$$Q = \sum_i f_i Q_i,$$

where $f_i = n_i/N$, and $N$ is the total number of sequences.
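The term matching and the two accuracy measures can be sketched in Python. The mapping reproduces the GO-slim/CELLO pairs listed above; the function names and the toy prediction pairs are hypothetical and are not taken from the benchmark datasets.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# GO-slim cellular-component terms and the CELLO terms they were matched to,
# as listed in the text.
GO_SLIM_TO_CELLO: Dict[str, Tuple[str, ...]] = {
    "external encapsulating structure": ("extracellular",),
    "extracellular region": ("extracellular",),
    "extracellular space": ("extracellular",),
    "plasma membrane": ("outer membrane", "inner membrane"),
    "cell": ("periplasmic",),
    "intracellular": ("periplasmic",),
    "cytoplasm": ("cytoplasmic",),
}


def agrees(go_slims: Iterable[str], cello_term: str) -> bool:
    """True if any GO-slim term maps onto the CELLO-predicted term."""
    return any(cello_term in GO_SLIM_TO_CELLO.get(slim, ()) for slim in go_slims)


def accuracies(pairs: List[Tuple[str, str]]) -> Tuple[Dict[str, float], float]:
    """pairs holds (true localization, CELLO prediction); returns (Q_i, Q) in percent."""
    n = Counter(truth for truth, _ in pairs)                       # n_i
    c = Counter(truth for truth, pred in pairs if truth == pred)   # C_i
    q_i = {loc: 100.0 * c[loc] / n[loc] for loc in n}              # Q_i
    total = len(pairs)                                             # N
    q = sum((n[loc] / total) * q_i[loc] for loc in n)              # sum of f_i * Q_i
    return q_i, q


# Toy data: four cytoplasmic proteins (three predicted correctly) and one
# extracellular protein (predicted correctly).
toy_pairs = [("cytoplasmic", "cytoplasmic")] * 3 + [
    ("cytoplasmic", "periplasmic"),
    ("extracellular", "extracellular"),
]
per_class, overall = accuracies(toy_pairs)
```

Because $f_i = n_i/N$, the overall accuracy equals the total number of correct predictions divided by $N$, expressed as a percentage.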

Web Server Description 

The web pages for CELLO2GO are shown in Figure 1B-D. Starting at
the homepage, the user can paste or upload a protein sequence or a
set of sequences in FASTA format into the text box. The "BLAST
search in" option allows the user to limit the search to sequences
from a specific organism. For precise annotation of the query
sequence, the "E-value" field allows the user to change the
threshold value for the retrieved homologs. As noted above, after
the protein sequence has been inputted, CELLO2GO will return four
Google-created pie charts: one containing the frequencies of
CELLO-predicted localizations and one for each of the three GO
annotations, which allows the user to readily visualize the
important GO annotations and possible subcellular localizations for
the query protein. Taking a multiple-sequence set as an example,
CELLO2GO returns four pie charts, one for each ontology (with each
associated ontology term reported as a percentage) found for the
inputted proteins (Figure 1B). The user can check the details by
clicking on the number associated with the protein in the table
list that appears below the pie charts. When a single sequence is
inputted, the output is also displayed as four pie charts, but
these charts report how often a GO term in an ontology is found in
the set of outputted homologs as a percentage (Figure 1C).

Results and Discussion 

We first calculated, and present in Figure 2, the statistical
distributions for the GO-slim molecular functions (Figure 2A) and
biological processes (Figure 2B) in relation to their GO cellular
components for all bacterial sequences found in the
UniProtKB/SwissProt database. Despite the amount of bias in the
database, the relationships between the functional annotations and
subcellular localizations are clearly seen. For example, proteins
with "RNA binding" as the associated molecular function GO term are
usually found in the cytoplasm or are associated with ribosomes.
Very few RNA-binding proteins are found associated with the plasma
membrane and hardly any are extracellular. Although most proteins
function in the cytoplasm (i.e., the GO slim categories cytoplasm,
cytosol, and ribosome), other proteins are found elsewhere, such as
those with "transmembrane transporter activities" and "ATPase
activities", which are associated mainly with plasma membranes.
Conversely, the relationships between biological processes and
subcellular localizations are spread more widely through Fig. 2B
than are those between molecular functions and subcellular
localizations. When homologous proteins with the same biological
process are found by CELLO2GO in the same localization, the results
may help determine if the proteins interact or participate in the
same pathway. When the protein of interest is found to have a
function that is associated with different subcellular
localizations, as is the case for certain multifunctional proteins
[32], it may be difficult to correlate its correct localizations
with its most likely function via examination of the statistical
distributions of molecular function/biological processes vs.
localization. It is therefore important to interpret a protein's
function in light of all of its associated ontology terms. For
example, for the bifunctional protein PutA from the Gram-negative
bacterium Escherichia coli (UniProtKB/SwissProt entry P09546) and
the multifunctional protein ThiED from the Gram-positive bacterium
Corynebacterium efficiens (UniProtKB/SwissProt entry Q8FTH8),
CELLO2GO comprehensively and accurately found their GO annotations
and made correct subcellular localization predictions.

The overall accuracies for subcellular-localization predictions
achieved by CELLO2GO are 99.1% for the Gram-negative bacterial,
99.4% for the Gram-positive bacterial, and 98.4% for the archaeal
sequences. Notably, for >50% of the sequences with no GO
cellular-component annotation, CELLO was able to correctly predict
their localizations. Table 1 contains a summary of the
GO-annotation coverage correlated with the five subcellular
localizations for the Gram-negative bacterial sequences in the
PS30GN dataset and the accuracy of CELLO predictions when
cellular-component annotations were missing from the BLAST search.
The UniProtKB/SwissProt and UniProtKB/TrEMBL databases were
separately searched for the three GO annotations for each query.
For the PS30GN dataset, which contains well annotated
localizations, BLAST easily found annotated homologs for most
queries. For the extracellular proteins in the PS30GN dataset, ~7%
could not be associated with a homolog that had a GO
cellular-component annotation, whereas for the proteins in the
other four localizations, all but <1.5% had homologs with GO
cellular-component annotations.
Figure 2. The frequency distributions for the GO slim of the UniProtKB/SwissProt entries in the in-house database. (A) Molecular
function (x axis) versus cellular component (y axis). (B) Biological process (x axis) versus cellular component (y axis). The size of each sphere is
proportional to the number of entries.
doi:10.1371/journal.pone.0099368.g002



We also document, in Table 2, the CELLO2GO results for the
experimentally derived proteome dataset of the Gram-negative
bacterium Pseudomonas aeruginosa PAO1. At least 30% of the
annotations are missing for each ontology. For one-third of the
sequences, the BLAST search did not find a homologous sequence that
could then be used to annotate the molecular functions and
biological processes of the input proteins. However, CELLO
increased the number of localization predictions. As was done for
PSORTb3.0, we assessed the 171 Pseudomonas aeruginosa PAO1 proteins
whose cytoplasmic location has been confirmed experimentally with
high confidence [23]; the CELLO prediction alone reached a recall
and a precision of 96.5% each, which is almost 5% better than
PSORTb3.0. Although the number of sequences in the in-house
UniProtKB/TrEMBL database is ~60-fold larger than that in the
in-house UniProtKB/SwissProt database, the search of the in-house
UniProtKB/TrEMBL database did not annotate many of the sequences
not already annotated by the in-house UniProtKB/SwissProt database.
Given this observation, the more reliable annotations found in the
UniProtKB/SwissProt-derived database, and the additional
computational time required to search the UniProtKB/TrEMBL-derived
database, the CELLO2GO default setting searches the
UniProtKB/SwissProt-derived database. The CELLO2GO results for the
PS30GP and PS30Arch datasets (Table 3 and Table 4, respectively)
are presented in the same manner as those for the PS30GN dataset
found in Table 1. Similar trends are seen in Tables 1, 2, and 3.

To show how the CELLO2GO results can be conveniently correlated, we
provide Fig. 1B as an example, which displays the results for the
419 PS30GN extracellular proteins that had been submitted. The
plus (+)/minus (-) symbols associated with GO term(s) of interest
are active and, when clicked, the server will respond by showing
only those proteins associated with the added/omitted GO terms. If
the user is interested in only proteins with "protein binding" as
the Molecular Function annotation and "cell adhesion" as the
Biological Process, then 21 sequences, including those for fimbriae
and certain secreted serine protease transporters, are displayed
with their GO terms (Figure 1D). Notably, many of these proteins,
e.g., the 354th inputted protein, the serine protease pic
autotransporter (GI: 68565646), have been associated with multiple
possible subcellular localizations, as documented in entry Q8CWC7
of the UniProtKB/SwissProt database. CELLO2GO also successfully
annotated both the outer membrane and the extracellular
localizations even though the protein was assigned a single
localization in the original dataset. For most proteins, BLAST in
CELLO2GO correctly annotated their cellular-component ontology, and
CELLO correctly predicted their localization. If the "shift
relation" button (top left in Fig. 1D) is clicked, other GO
term-related proteins, e.g., flagellum and virulence proteins (from
the original list of outputted proteins), are added to the list
because either the Molecular Function GO-slim term "protein
binding" or the Biological Process GO-slim term "cell adhesion",
although not both, was assigned to these proteins. By using the
"shift relation" button, users can switch between an "either/or"
retrieval for "union" and an "and" retrieval for "intersection".
Sometimes the addition of more GO terms can be used to restrict the
functions or processes of interest, which may eliminate proteins
with promiscuous functions. Certain proteins have generally defined
GO-slim terms, e.g., those for the 369th inputted protein,
bifunctional hemolysin/adenylate cyclase (GI: 34978355). Notably,
although hemolysin and cyclase have different functions, both
proteins have the GO-slim defined molecular function "ion binding."
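The "and"/"either/or" retrieval and the add/omit selection can be sketched in Python with set operations. This is an illustrative sketch rather than the server's code: the function name, its parameters, the protein labels, and the term "cell motility" are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def select_proteins(
    annotations: Dict[str, Set[str]],
    include: Set[str],
    omit: FrozenSet[str] = frozenset(),
    mode: str = "and",
) -> List[str]:
    """Select proteins by GO-slim terms.

    mode="and" keeps proteins carrying every included term (intersection);
    mode="or" keeps proteins carrying at least one of them (union).
    Proteins carrying any omitted term are always dropped.
    """
    if mode not in ("and", "or"):
        raise ValueError("mode must be 'and' or 'or'")
    selected = []
    for name, terms in annotations.items():
        if terms & omit:
            continue
        matched = include <= terms if mode == "and" else bool(include & terms)
        if matched:
            selected.append(name)
    return sorted(selected)


# Toy annotations for three hypothetical entries.
toy = {
    "fimbrial protein": {"protein binding", "cell adhesion"},
    "flagellar protein": {"cell adhesion", "cell motility"},
    "hemolysin": {"ion binding"},
}
wanted = {"protein binding", "cell adhesion"}
both = select_proteins(toy, wanted, mode="and")
either = select_proteins(toy, wanted, mode="or")
```

Switching `mode` from "and" to "or" plays the role of the "shift relation" button: the flagellar entry, which carries only one of the two terms, joins the list.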

At the same time, the GO-based functional annotation for a single
protein may be incomplete because of insufficient experimental
assays, or disorderly because too many homologs are identified by
BLAST. Both problems limit the usefulness of CELLO2GO. The latter
issue can be addressed by setting a stricter E-value threshold.

We also ran CELLO2GO on a dataset derived from the Gram-negative
pathogenic bacterium Vibrio cholerae. The previous work [33]
attempted to identify some potential drug and vaccine candidates by
using a complex computational workflow based on a comparative and
subtractive genomic analysis strategy and pipelining multiple
tools. Without carrying out extensive computation to confirm which
proteins are present in the pathogen but absent in the host,
CELLO2GO can respond by showing only those proteins associated with
the added GO terms of interest, or by omitting GO terms for
functions shared with the host. And some GO terms
relative to pathogenic pathway can be further exploited in this
"isomerase activity" in Molecular Function ontology, "biosyn-
chemotaxis", which involving in flagellar motor, and the GO-slim
respectively. The current subcellular localization prediction tools
and most existing functional annotation software do not provide
In summary, CELLO2GO can provide brief or detailed annotations of
GO categories by combining CELLO localization-prediction and BLAST
homology-searching approaches for single or multiple input
sequences. When each protein sequence in a query dataset can be
confidently annotated, even though not all proteins in a query set
have known localizations, CELLO2GO quickly screens for as many
localizations and GO annotations associated with the sequences as
possible and collects them as output. CELLO2GO should be a useful
tool for research involving complex biological systems.